
FIGURE 3.25
Evolution of the binarized values |x| during the XNOR-Net and BONN training processes. Both models are based on WRN-22 (2nd, 3rd, 8th, and 14th convolutional layers), and the curves do not share the same y-axis. The binarized values of XNOR-Net tend to converge to small and similar values, whereas those of BONN are learned diversely.

a learning rate schedule that decays the learning rate to 10% of its value every 30 epochs. As shown in Table 3.6, our Bayesian feature loss further boosts the performance of real-valued models by a clear margin. Specifically, it improves the Top-1 accuracy of ResNet-18 and ResNet-50 by 0.6% and 0.4%, respectively.
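The schedule above is a standard step decay. As a minimal sketch, assuming a PyTorch training setup (which the text does not specify; the model and optimizer below are placeholders), the rule "decay to 10% every 30 epochs" can be written as:

import torch

model = torch.nn.Linear(10, 10)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Multiply the learning rate by 0.1 (i.e., decay to 10%) every 30 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... one training epoch would run here ...
    scheduler.step()  # update the learning rate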

3.8 RBONN: Recurrent Bilinear Optimization for a Binary Neural Network

We first briefly introduce bilinear models in deep learning. Bilinear models arise in CNNs in several settings. One important application is network pruning, among the most active topics in the deep learning community [142, 162]: vital feature maps and their related channels are selected with bilinear models [162], and iterative methods such as the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) [141] and the Accelerated Proximal Gradient (APG) method [97] can be used to prune bilinear-based networks. Many other deep learning applications, such as fine-grained categorization [146, 133], visual question answering (VQA) [278], and person re-identification [214], benefit from embedding bilinear models into CNNs, where they model pairwise feature interactions and fuse multiple features with attention.
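To make the pairwise-interaction idea concrete, the following is a minimal PyTorch sketch of bilinear pooling, not the chapter's own implementation; the signed square root and L2 normalization follow common practice in bilinear CNNs for fine-grained categorization:

import torch
import torch.nn.functional as F

def bilinear_pool(x, y):
    """Outer product of two feature maps at each spatial location,
    averaged over locations: a (C1 x C2) map of pairwise channel
    interactions per sample.
    x: (batch, C1, H, W), y: (batch, C2, H, W)."""
    b, c1, h, w = x.shape
    c2 = y.shape[1]
    x = x.reshape(b, c1, h * w)
    y = y.reshape(b, c2, h * w)
    z = torch.bmm(x, y.transpose(1, 2)) / (h * w)  # (b, c1, c2)
    z = z.reshape(b, c1 * c2)
    z = torch.sign(z) * torch.sqrt(z.abs() + 1e-12)  # signed square root
    return F.normalize(z, dim=1)                     # L2 normalization

Setting y = x recovers the self-bilinear pooling used in fine-grained categorization, while distinct x and y correspond to fusing two feature sources, as in VQA.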

Previous methods [77, 148] enhance the representation capability of BNNs by computing scaling factors that approximate the real-valued weight filter $\mathbf{w}$ as $\mathbf{w} \approx \alpha \circ \mathbf{b}_{\mathbf{w}}$, where $\alpha \in \mathbb{R}^{+}$ is the scaling factor (vector) and $\mathbf{b}_{\mathbf{w}} = \operatorname{sign}(\mathbf{w})$ (a minimal sketch of this approximation is given after Table 3.6). In essence, the approximation

TABLE 3.6
Effect of the Bayesian feature loss on the ImageNet dataset. The backbones are real-valued ResNet-18 and ResNet-50.

Model        Bayesian feature loss    Top-1 (%)    Top-5 (%)
ResNet-18                             69.3         89.2
ResNet-18    ✓                        69.9         89.8
ResNet-50                             76.6         92.4
ResNet-50    ✓                        77.0         92.7
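As referenced above, the following is a minimal PyTorch sketch of the scaling-factor approximation $\mathbf{w} \approx \alpha \circ \mathbf{b}_{\mathbf{w}}$. The choice of $\alpha$ as the mean absolute value per output filter is the closed-form least-squares solution used by XNOR-Net [77]; the tensor shape below is an assumption for illustration only:

import torch

def binarize_weights(w):
    """Approximate each real-valued filter by alpha * sign(w).
    w: (out_channels, in_channels, kH, kW) convolution weights."""
    b_w = torch.sign(w)                                # b_w = sign(w)
    alpha = w.abs().mean(dim=(1, 2, 3), keepdim=True)  # one scalar per filter
    return alpha, b_w

w = torch.randn(64, 32, 3, 3)
alpha, b_w = binarize_weights(w)
w_hat = alpha * b_w                   # w ≈ alpha ∘ b_w
print((w - w_hat).norm() / w.norm())  # relative approximation error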